Feature Extraction from Speech and Phoneme-recognition Using
Not recorded
Abstract
Speech is a time-varying signal, and the information it carries is difficult to analyze. Traditional methods of speech analysis use the Short-Time Fourier Transform (STFT) and Mel-Frequency Cepstral Coefficients (MFCC) to extract features from a speech signal, and this approach has been implemented successfully in many speech recognition systems. Their performance varies with context, and the results are promising only when the speech signals are of short duration. After studying the theory of wavelets, I learnt that the wavelet transform has better time-frequency localization than the STFT [5]. This project report is an excursion into one promising application of wavelets: speech recognition. I applied the wavelet transform to extract features from speech signals. To start with, I used speech signals of short duration; all of them are phonemes (the fundamental units of speech). To measure the performance of the feature extraction, I fed the processed feature vectors to a neural network for classification. For classification I used the standard Self-Organizing Feature Map (SOFM) [11], which is an unsupervised learning method. The results are interesting and call for further experimentation, which will be continued in future work.
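The pipeline described in the abstract (wavelet features fed to a self-organizing map) can be sketched roughly as follows. The abstract does not state the wavelet family, the decomposition depth, the feature definition, or the map size, so everything concrete here is an assumption for illustration: the Haar wavelet, four decomposition levels, normalised sub-band energies as features, a 3x3 map, and the helper names `haar_dwt_level`, `wavelet_energy_features`, `train_som`, and `best_matching_unit`.

```python
import numpy as np

def haar_dwt_level(x):
    """One level of the Haar DWT (assumed wavelet): returns (approximation, detail)."""
    x = np.asarray(x, dtype=float)
    if len(x) % 2:                      # pad odd-length input by repeating the last sample
        x = np.append(x, x[-1])
    even, odd = x[0::2], x[1::2]
    return (even + odd) / np.sqrt(2.0), (even - odd) / np.sqrt(2.0)

def wavelet_energy_features(signal, levels=4):
    """Normalised energy per sub-band, ordered [D1, ..., D_levels, A_levels].

    Sub-band energy is an assumed feature choice, not taken from the report.
    """
    approx = np.asarray(signal, dtype=float)
    energies = []
    for _ in range(levels):
        approx, detail = haar_dwt_level(approx)
        energies.append(np.sum(detail ** 2))
    energies.append(np.sum(approx ** 2))
    energies = np.array(energies)
    total = energies.sum()
    return energies / total if total > 0 else energies

def train_som(data, grid=(3, 3), epochs=50, lr0=0.5, sigma0=1.0, seed=0):
    """Train a small self-organizing map; returns weights of shape (rows*cols, dim)."""
    rng = np.random.default_rng(seed)
    rows, cols = grid
    # Grid coordinates of each unit, used for the neighbourhood function.
    coords = np.array([(r, c) for r in range(rows) for c in range(cols)], dtype=float)
    weights = rng.random((rows * cols, data.shape[1]))
    n_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for i in rng.permutation(len(data)):
            frac = step / n_steps
            lr = lr0 * (1.0 - frac)                  # linearly decaying learning rate
            sigma = sigma0 * (1.0 - frac) + 1e-3     # shrinking neighbourhood radius
            bmu = np.argmin(np.sum((weights - data[i]) ** 2, axis=1))
            dist2 = np.sum((coords - coords[bmu]) ** 2, axis=1)
            h = np.exp(-dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood around the BMU
            weights += lr * h[:, None] * (data[i] - weights)
            step += 1
    return weights

def best_matching_unit(weights, x):
    """Index of the map unit whose weight vector is closest to x."""
    return int(np.argmin(np.sum((weights - x) ** 2, axis=1)))

if __name__ == "__main__":
    # Synthetic stand-ins for phoneme recordings (not real speech data).
    rng = np.random.default_rng(1)
    t = np.arange(256)
    low = [np.sin(2 * np.pi * 2 * t / 256) + 0.05 * rng.standard_normal(256) for _ in range(10)]
    high = [np.sin(2 * np.pi * 100 * t / 256) + 0.05 * rng.standard_normal(256) for _ in range(10)]
    feats = np.array([wavelet_energy_features(s) for s in low + high])
    som = train_som(feats)
    print([best_matching_unit(som, f) for f in feats])
```

Because the map is unsupervised, the printed unit indices only group similar feature vectors; assigning phoneme labels to units would need a separate labelling step. A library such as PyWavelets could supply other wavelet families in place of the hand-written Haar step.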
Similar resources
Improving the performance of a continuous speech recognition system using features extracted from speech manifolds in the reconstructed phase space
Designing new methods for extracting features from the speech signal and combining the information they provide is one of the most effective approaches to improving the performance of automatic speech recognition (ASR) systems. Recent research has shown that the speech signal contains nonlinear and chaotic properties, but the effects of these properties are not used in the continuous ...
Improving Phoneme Sequence Recognition using Phoneme Duration Information in DNN-HSMM
Improving phoneme recognition has attracted the attention of many researchers due to its applications in various fields of speech processing. Recent research achievements show that using deep neural network (DNN) in speech recognition systems significantly improves the performance of these systems. There are two phases in DNN-based phoneme recognition systems including training and testing. Mos...
Phoneme Classification Using Temporal Tracking of Speech Clusters in Spectro-temporal Domain
This article presents a new feature extraction technique based on the temporal tracking of clusters in the spectro-temporal feature space. In the proposed method, auditory cortical outputs were clustered. The attributes of the speech clusters were extracted as secondary features. However, the shape and position of the speech clusters change over time. The clusters were temporally tracked and temporal tra...
Allophone-based acoustic modeling for Persian phoneme recognition
Phoneme recognition is one of the fundamental phases of automatic speech recognition. Coarticulation, which refers to the integration of sounds, is one of the important obstacles in phoneme recognition. In other words, each phone is influenced and changed by the characteristics of its neighboring phones, and coarticulation is responsible for most of these changes. The idea of modeling the effects o...
A three-dimensional approach to Visual Speech Recognition using Discrete Cosine Transforms
Visual speech recognition aims to identify the sequence of phonemes from continuous speech. Unlike the traditional approach of using 2D image feature extraction methods to derive features of each video frame separately, this paper proposes a new approach using a 3D (spatio-temporal) Discrete Cosine Transform to extract features of each feasible sub-sequence of an input video which are subsequen...
Estimating phoneme class conditional probabilities from raw speech signal using convolutional neural networks
In a hybrid hidden Markov model/artificial neural network (HMM/ANN) automatic speech recognition (ASR) system, the phoneme class conditional probabilities are estimated by first extracting acoustic features from the speech signal based on prior knowledge, such as speech perception and/or speech production knowledge, and then modeling the acoustic features with an ANN. Recent advances in machine...
Journal title:
Volume, issue
Pages -
Publication date 2011